Andre Esser writes:
> The parameter selection process of Classic McEliece takes the
> asymptotic runtime exponent of Prange and chooses the code parameters
> such that it approximately matches the respective AES-keysize, i.e.,
> 128, 192 or 256.

No, that's not at all how the Classic McEliece parameters were chosen.
The parameter sets were explicitly selected as follows:

   * 348864: "optimal security within 2^18 bytes if n and t are
     required to be multiples of 32".

   * 460896: "optimal security within 2^19 bytes if n and t are
     required to be multiples of 32".

   * 6688128: "optimal security within 2^20 bytes if n and t are
     required to be multiples of 32".

   * 6960119: same without the multiples-of-32 restriction.

   * 8192128: "taking both n and t to be powers of 2".

Parameter optimization for any specified key size is simpler and more
robust than trying to work backwards from comparisons between McEliece
and unrelated cryptosystems. There's no inherent power-of-2 requirement
for the key sizes, but 1MB, 0.5MB, 0.25MB are easy to remember. (The
short script below checks each proposed parameter set against these
byte budgets.)

The quotes above are from the submission document that presented the
current list of proposed parameters:

   https://classic.mceliece.org/nist/mceliece-20190331-mods.pdf

Scientifically, this parameter-selection strategy for McEliece goes
back at least to https://eprint.iacr.org/2008/318. See the paragraph
in that paper beginning "For keys limited to", in particular obtaining
n = 6960 from a 1MB limit.

The underlying security evaluations have always been explicitly based
on concrete analyses, _not_ asymptotics (even though asymptotics are
helpful for assessing security stability). This was already emphasized
in Section 8.2 of the original submission document:

   https://classic.mceliece.org/nist/mceliece-20171129.pdf

The section begins as follows:

   We emphasize that o(1) does not mean 0: it means something that
   converges to 0 as n → ∞. More detailed attack-cost evaluation is
   therefore required for any particular parameters.

That section continues by pointing to the 2008 paper mentioned above,
https://eprint.iacr.org/2008/318, as the source of n = 6960. Anyone
checking that paper sees that the paper obtained this value of n via a
concrete analysis of that paper's attack, not from asymptotics.
Furthermore, that attack is noticeably faster than Prange's algorithm.
Asymptotically, this cost difference disappears into a 1+o(1) factor
in the exponent, but the parameter selection has never relied on any
such simplification.

To be clear, it's not that all of the parameter details are from 2008.
For example, to simplify KEM implementations, the Classic McEliece
parameter choices avoid list decoding (which would otherwise allow 1
or 2 extra errors). More importantly, the smaller parameter sets were
not in the original 2017 proposal; they were added in 2019, in
response to NIST making clear in 2019 that it wanted smaller options.

For any particular parameter set, evaluations of the Classic McEliece
parameter proposals using the 2008 scripts and various post-2008
scripts show some differences, mostly minor differences regarding
exactly which attack overheads are counted and exactly which attacks
are covered. The largest quantitative differences come from the gaps
between free-memory models and models that charge realistic costs for
memory access.
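Here is the short Python check of those byte budgets promised above.
This is my own illustration, not a script from the submission; the
row-padded size formula mt*ceil((n-mt)/8) bytes matches the public-key
sizes listed in the submission documents.

   # Check each proposed parameter set against its stated key-size
   # budget. The public key is an mt x (n-mt) binary matrix; each of
   # the mt rows is padded to a whole number of bytes.
   params = {  # name: (m, n, t, byte budget; None = no budget)
       '348864':  (12, 3488,  64, 2**18),
       '460896':  (13, 4608,  96, 2**19),
       '6688128': (13, 6688, 128, 2**20),
       '6960119': (13, 6960, 119, 2**20),
       '8192128': (13, 8192, 128, None),  # n,t powers of 2 instead
   }
   for name, (m, n, t, budget) in params.items():
       k = n - m*t                       # code dimension
       keybytes = m*t * ((k + 7) // 8)   # row-padded matrix size
       print(name, keybytes, budget,
             'ok' if budget is None or keybytes <= budget else 'over')

This reproduces the key sizes in the submission: e.g., 1047319 bytes
for 6960119, just under the 2^20 = 1048576 limit.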
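In the same spirit, here is a deliberately crude back-of-the-envelope
for plain Prange applied to 6960119. Again, this is my own sketch, not
the 2008 analysis (which is more detailed and covers a faster attack):
the expected number of iterations is C(n,t)/C(n-k,t), the inverse of
the probability that a random information set avoids all t errors, and
each iteration costs roughly one Gaussian elimination.

   # Crude concrete cost of plain Prange ISD against mceliece6960119.
   # A sketch only: the 2008 attack is noticeably faster, and nothing
   # here accounts for the cost of memory access.
   from math import lgamma, log

   def log2binom(n, k):  # log2 of the binomial coefficient C(n,k)
       return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2)

   n, t = 6960, 119
   k = n - 13*t                                   # dimension: 5413
   iters = log2binom(n, t) - log2binom(n - k, t)  # log2 expected iterations
   gauss = log((n - k)**2 * n, 2)                 # log2 bit ops/iteration, crude
   print('log2 iterations: %.1f' % iters)         # roughly 264
   print('log2 bit ops:    %.1f' % (iters + gauss))

Plain Prange comes out near 2^298 bit operations here; the better
algorithms discussed above cut the free-memory bit-operation count
considerably, which is exactly why the realistic cost of their memory
access matters. The scripts mentioned above refine exactly these kinds
of counts.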
The Classic McEliece team filed an OFFICIAL COMMENT years ago
requesting that NIST "fully define the cost metric to be used" for
NISTPQC, so that all submission teams could evaluate costs in this
metric:

   https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/EiwxGnfQgec/m/LqckEVciAQAJ

In the absence of action by NIST to settle on a metric for NISTPQC,
the Classic McEliece team filed another OFFICIAL COMMENT in November
2021 with numbers from a recent estimator for all proposed parameter
sets:

   https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/dN_O0rvsLV4/m/QZ4UjtxnAwAJ

For example, that estimator says 2^279.2 for the 6960119 parameter
set, in line with the expectations stated in the original Classic
McEliece submission in 2017. This is not a surprise, given how stable
the landscape of attack algorithms has been.

> The security analysis then relies on the not quantified statement
> that no algorithmic improvement over Prange needs to be considered
> because in a real attack the memory access costs outweigh the
> improvement.

No. Section 8.2 of the 2017 submission document

   https://classic.mceliece.org/nist/mceliece-20171129.pdf

started from the 2008 numbers (which are already better than Prange),
explicitly considered subsequent algorithms (see also Section 4.1 for
references), observed that the 2008 algorithm and subsequent
algorithms were bottlenecked by memory access, and stated the
following regarding the recommended 6960119 parameter set:

   We expect that switching from a bit-operation analysis to a cost
   analysis will show that this parameter set is more expensive to
   break than AES-256 pre-quantum and much more expensive to break
   than AES-256 post-quantum.

Adequate cost quantification wasn't in the literature at that point,
but is now readily available from the November 2021 OFFICIAL COMMENT:

   https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/dN_O0rvsLV4/m/QZ4UjtxnAwAJ

The submission has always stated that "ISD variants have reduced the
number of bit operations considerably below 2^256" for 6960119, so the
category-5 assignment for this parameter set relies on accounting for
the costs of memory. For people who want category 5 while ignoring
memory costs, the 8192128 parameter set has always been fully
specified and implemented as part of the Classic McEliece proposal.

> While we would not agree that a processor with AES hardware
> acceleration is a particularly suboptimal way of attacking AES

Here is one way to see that an AES attacker using the same 7nm chip
technology can do _five orders of magnitude_ better than the paper's
AES attack:

   * https://www.ant-miner.store/product/antminer-s17-56th/ describes
     real Bitcoin mining hardware carrying out 56 terahashes/second at
     2520 watts, i.e., 45*10^-12 joules per hash. If these hashes are
     full Bitcoin hashes, 2xSHA-256, then they are roughly the same
     cost as 16xAES-128, so a similar AES attack box would use roughly
     3*10^-12 joules per AES-128.

   * The paper in question, https://eprint.iacr.org/2021/1634, reports
     (in Section 7) 2.16*10^9 "AES encryptions per second performed by
     our cluster". The cluster has four EPYC 7742 CPUs (according to
     Section 4). The power consumption of the cluster isn't reported,
     but presumably is on the scale of 1000 watts, i.e., on the scale
     of 500000*10^-12 joules per AES-128.
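The arithmetic in those two bullet items is easy to check in a few
lines of Python. As stated above, the 1000-watt figure for the cluster
is an assumption about scale, not a reported measurement.

   # Joules per AES-128 computation: mining ASIC versus CPU cluster.
   hashes_per_second = 56e12      # Antminer S17: 56 terahashes/second
   miner_watts = 2520
   j_per_hash = miner_watts / hashes_per_second  # ~45*10^-12
   j_per_aes_miner = j_per_hash / 16             # 2xSHA-256 ~ 16xAES-128
   cluster_aes_per_second = 2.16e9  # eprint 2021/1634, Section 7
   cluster_watts = 1000             # assumed scale, not measured
   j_per_aes_cluster = cluster_watts / cluster_aes_per_second
   print('miner:   %.2e J per AES-128' % j_per_aes_miner)
   print('cluster: %.2e J per AES-128' % j_per_aes_cluster)
   print('ratio:   %.1e' % (j_per_aes_cluster / j_per_aes_miner))

The ratio printed at the end is about 1.6*10^5: five orders of
magnitude.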
An easier analysis considers AT rather than energy. Each 64-core EPYC
7742 CPU has 32 billion transistors, meaning that each CPU core has
half a billion transistors:

   https://hexus.net/tech/reviews/cpu/133244-amd-epyc-7742-2p-rome-server/?page=2

The AES attack in question uses the AES instructions, which, on each
of these cores, can at best carry out two parallel AES rounds per
cycle, according to the Zen 2 AESENC figures in Agner Fog's tables:

   https://agner.org/optimize/instruction_tables.pdf

Each round uses a few thousand bit operations, with a small number of
transistors per bit operation, accounting for only a tiny fraction of
the transistors in the CPU. Standard key-search circuits instead have
almost all transistors performing cipher operations at each moment,
with minor overheads for key selection and comparison.

> but so are for ISD attacks.

Quantitatively, compared to their AES hardware, Intel and AMD put
_much_ more of their hardware into optimizing memory access---a
serious chunk of each core, plus extensive off-chip resources---for
obvious reasons.

The point here isn't that it's impossible to use special-purpose
hardware to streamline memory-intensive attacks. The point is that a
claim regarding the quantitative costs of two attacks, namely AES key
search and a memory-intensive ISD attack, was comparing time while
neglecting to account for the vast differences in the hardware
resources used by the attacks. This is not a meaningful security
comparison; it does not correctly predict what large-scale attackers
can do.

> But here you are asking *us* to provide the formalisms on which you
> seem to have based the security of the parameter sets.

No. The Classic McEliece team asked "NIST to fully define the cost
metric to be used for 'categories'". This is not something that should
be decided ad hoc for particular attacks. The submission has always
explicitly advocated accounting for the real cost of memory
("switching from a bit-operation analysis to a cost analysis"). If it
turns out that NIST asks for category 5 with free memory: as noted
above, the 8192128 parameter set has always been available.

> We are modeling the *amortized* memory access cost.

All of these algorithms are bottlenecked by large-scale data motion
for, e.g., sorting b-bit chunks of data, where b is small. It is
physically implausible that moving b bits of data large distances
through an array of size 2^80 can be as cheap as 80b bit operations.

> Of course, by the inherent limitation given by our hardware
> constraints the data points for extrapolation are selected from a
> limited range.

The hardware does not require this selection: the same machine offers
lower-cost caches (and higher-cost storage beyond DRAM). Real graphs
of memory-access cost, such as

   https://i0.wp.com/chipsandcheese.com/wp-content/uploads/2022/06/image-4.png
   https://i0.wp.com/chipsandcheese.com/wp-content/uploads/2022/06/image-1.png

   source:
   https://chipsandcheese.com/2022/06/07/sunny-cove-intels-lost-generation/

are forced by distance constraints to be worse than square-root lines
in log space, and are then bumped to locally horizontal line segments
(L1 cache, L2 cache, etc.) to simplify the engineering. Taking
experiments within one of those horizontal line segments, and then
extrapolating horizontally from this selection of data (or
logarithmically, which on such a small scale is difficult to robustly
distinguish from horizontal extrapolation), gives incorrect
predictions for larger sizes.
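As a toy illustration of that last point (my own numbers, not data
from the paper): calibrate a square-root curve at the top of one flat
cache segment, and compare it to a horizontal extrapolation from the
same segment.

   # Toy model: inside one cache level the measured cost looks flat;
   # distance constraints in a 2D layout force the true curve to grow
   # at least like sqrt(N) overall. The numbers below are made up.
   import math
   plateau_cost = 12.0   # cycles, "measured" at sizes around 2^22 bytes
   calib_size = 2**22
   scale = plateau_cost / math.sqrt(calib_size)  # sqrt curve through that point
   for logN in (30, 40, 80):
       sqrt_cost = scale * math.sqrt(2.0**logN)
       print('2^%d bytes: flat says %.0f cycles, sqrt says 2^%.1f cycles'
             % (logN, plateau_cost, math.log2(sqrt_cost)))

Both extrapolations look equally plausible from inside the plateau,
but at 2^80 bytes they differ by a factor of roughly 2^29.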
---D. J. Bernstein (speaking for myself, beyond the quotes)